Extraction of Semantic Information from Web Resources

Author

  • J. Dědek
Abstract

This paper addresses the problem of extracting semantic information from Czech texts on the Web. The method exploits existing linguistic tools originally created for a syntactically annotated corpus, the Prague Dependency Treebank (PDT 2.0). We are developing a system that captures the text of web pages, annotates it with these linguistic tools, extracts data, and interprets the extracted data semantically in terms of web ontologies. The proposed extraction method is based on extraction rules in the form of tree queries, which are adopted from the Netgraph application. The semantic interpretation of these rules provides the semantics of the extracted data. We present some initial experiments in the domain of traffic accident reports.
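To make the idea of an extraction rule expressed as a tree query more concrete, here is a minimal Python sketch. It is not the Netgraph query language and not the authors' implementation: the `Node` and `Query` classes, the attribute names `lemma` and `role`, the toy English tree, and the `ex:numberOfInjured` property are all hypothetical, chosen only for illustration. The matcher is also simplified: each child query takes the first child that matches, with no backtracking.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """One node of a toy dependency tree (attribute names are hypothetical)."""
    attrs: Dict[str, str]
    children: List["Node"] = field(default_factory=list)


@dataclass
class Query:
    """A tree query: attribute constraints, an optional capture name, child queries."""
    constraints: Dict[str, str]
    capture: Optional[str] = None
    children: List["Query"] = field(default_factory=list)


def match_at(query: Query, node: Node) -> Optional[Dict[str, str]]:
    """Try to match `query` rooted at `node`; return captured lemmas or None."""
    for key, wanted in query.constraints.items():
        if node.attrs.get(key) != wanted:
            return None
    captured: Dict[str, str] = {}
    if query.capture is not None:
        captured[query.capture] = node.attrs.get("lemma", "")
    for child_query in query.children:
        # Simplification: the first child that matches wins, no backtracking.
        for child in node.children:
            result = match_at(child_query, child)
            if result is not None:
                captured.update(result)
                break
        else:
            return None  # no child of this node satisfies the child query
    return captured


def find_all(query: Query, root: Node) -> Iterator[Dict[str, str]]:
    """Yield the captures for every node of the tree at which the query matches."""
    result = match_at(query, root)
    if result is not None:
        yield result
    for child in root.children:
        yield from find_all(query, child)


# Toy tree for "Two people were injured in the accident" (invented data).
TREE = Node({"lemma": "injure", "role": "predicate"}, [
    Node({"lemma": "person", "role": "patient"}, [
        Node({"lemma": "two", "role": "quantity"}),
    ]),
    Node({"lemma": "accident", "role": "location"}),
])

# Extraction rule: an "injure" node whose patient carries a quantity.
QUERY = Query({"lemma": "injure"}, children=[
    Query({"role": "patient"}, children=[
        Query({"role": "quantity"}, capture="injured_count"),
    ]),
])

# Hypothetical semantic interpretation: capture name -> ontology property.
PROPERTY_FOR_CAPTURE = {"injured_count": "ex:numberOfInjured"}

matches = list(find_all(QUERY, TREE))
facts = [(PROPERTY_FOR_CAPTURE[name], value)
         for captures in matches for name, value in captures.items()]
print(facts)
```

The mapping from capture names to ontology properties is where, in this sketch, the rule acquires its semantic interpretation: the query only locates nodes, and the mapping says what the located value means.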


Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...


Annotation sémantique des ressources Web : Etat de l’art et perspectives de recherche

The problem of semantic annotation of Web resources interests researchers from different communities who seek to improve the information retrieval process. Indeed, semantic annotation of documents and Web pages is hard work. Automating the construction process is essential to modernize information retrieval in the Semantic Web. In this paper, we present a new semantic annotation approach of Web resource...


Semantic Based Information Extraction from Web

Extracting information from the web is a challenging task. The information stored on the web may be structured or unstructured. Structured information provides enhanced knowledge, which helps to retrieve relevant documents and helps the user to understand a particular domain. This paper explores the importance of information extraction using semantics. It enables the users to discover...


A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...


Towards Cross-Media Feature Extraction

In this paper we describe past and present work dealing with the use of textual resources, out of which semantic information can be extracted in order to provide for semantic annotation and indexing of associated image or video material. Since the emergence of semantic web technologies and resources, entities, relations and events extracted from textual resources by means of Information Extract...


Information Extraction from Wikipedia Using Pattern Learning

In this paper we present solutions for the crucial task of extracting structured information from massive free-text resources, such as Wikipedia, for the sake of semantic databases serving upcoming Semantic Web technologies. We demonstrate both a verb frame-based approach using deep natural language processing techniques with extraction patterns developed by human knowledge experts and machine ...




Publication date: 2008